
114 research outputs found

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition

Author: Khalifa Sara
Rajapakshe Thejan
Rana Rajib
Schuller Bjorn W.
Publication date: 23/09/2022

Speech emotion recognition (SER) lets computers understand people and engage with them in an emotionally intelligent way. However, the performance of SER in cross-corpus and real-world live data feed scenarios can be significantly improved. One shortcoming of SER methods is the inability to adapt an existing model to a new domain. To address this challenge, researchers have developed domain adaptation techniques that transfer knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved performance across domains, they can be improved further to handle a real-world live data feed situation where a model self-tunes while deployed. In this paper, we present a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained model to a real-world live data feed setting while interacting with the environment and collecting continual feedback. RL-DA is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation schema. Evaluation results show that in a live data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in cross-corpus and cross-language scenarios, respectively.

arXiv.org e-Print Archive

Enhancing Speech Emotion Recognition Through Differentiable Architecture Search

Author: Khalifa Sara
Rajapakshe Thejan
Rana Rajib
Schuller Björn
Sisman Berrak
Publication date: 18/01/2024

Speech Emotion Recognition (SER) is a critical enabler of emotion-aware communication in human-computer interactions. Recent advancements in Deep Learning (DL) have substantially enhanced the performance of SER models through increased model complexity. However, designing optimal DL architectures requires prior experience and experimental evaluations. Encouragingly, Neural Architecture Search (NAS) offers a promising avenue to determine an optimal DL model automatically. In particular, Differentiable Architecture Search (DARTS) is an efficient method of using NAS to search for optimised models. This paper proposes a DARTS-optimised joint CNN and LSTM architecture to improve SER performance, where the literature informs the selection of CNN and LSTM coupling to offer improved performance. While DARTS has previously been applied to CNN and LSTM combinations, our approach introduces a novel mechanism, particularly in selecting CNN operations using DARTS. In contrast to previous studies, we refrain from imposing constraints on the order of the layers for the CNN within the DARTS cell; instead, we allow DARTS to determine the optimal layer order autonomously. Experimenting with the IEMOCAP and MSP-IMPROV datasets, we demonstrate that our proposed methodology achieves significantly higher SER accuracy than hand-engineering the CNN-LSTM configuration. It also outperforms the best-reported SER results achieved using DARTS on CNN-LSTM. Comment: 5 pages, 4 figures

arXiv.org e-Print Archive

Self Supervised Adversarial Domain Adaptation for Cross-Corpus and Cross-Language Speech Emotion Recognition

Author: Jurdak Raja
Khalifa Sara
Latif Siddique
Rana Rajib
Schuller Björn
Publication date: 13/04/2022

Despite recent advances in speech emotion recognition (SER) within a single-corpus setting, the performance of these SER systems degrades significantly in cross-corpus and cross-language scenarios. The key reason is the lack of generalisation in SER systems towards unseen conditions, which causes them to perform poorly in cross-corpus and cross-language settings. To address this issue, recent studies use adversarial methods to learn domain-generalised representations for improving cross-corpus and cross-language SER. However, many of these methods focus only on cross-corpus SER without addressing the cross-language SER performance degradation caused by a larger domain gap between source and target language data. This contribution proposes an adversarial dual discriminator (ADDi) network that uses a three-player adversarial game to learn generalised representations without requiring any target data labels. We also introduce a self-supervised ADDi (sADDi) network that utilises self-supervised pre-training with unlabelled data. We propose synthetic data generation as a pretext task in sADDi, enabling the network to produce emotionally discriminative and domain-invariant representations and providing complementary synthetic data to augment the system. The proposed model is rigorously evaluated using five publicly available datasets in three languages and compared with multiple studies on cross-corpus and cross-language SER. Experimental results demonstrate that the proposed model achieves improved performance compared to the state-of-the-art methods. Comment: Accepted in IEEE Transactions on Affective Computing

arXiv.org e-Print Archive

OPUS Augsburg

Queensland University of Technology ePrints Archive

University of Southern Queensland ePrints

Multitask Learning from Augmented Auxiliary Data for Improving Speech Emotion Recognition

Author: Jurdak Raja
Khalifa Sara
Latif Siddique
Rana Rajib
Schuller Björn W.
Publication date: 12/07/2022

Despite the recent progress in speech emotion recognition (SER), state-of-the-art systems lack generalisation across different conditions. A key underlying reason for poor generalisation is the scarcity of emotion datasets, which is a significant roadblock to designing robust machine learning (ML) models. Recent works in SER focus on utilising multitask learning (MTL) methods to improve generalisation by learning shared representations. However, most of these studies propose MTL solutions that require meta labels for auxiliary tasks, which limits the training of SER systems. This paper proposes an MTL framework (MTL-AUG) that learns generalised representations from augmented data. We utilise augmentation-type classification and unsupervised reconstruction as auxiliary tasks, which allow training SER systems on augmented data without requiring any meta labels for auxiliary tasks. The semi-supervised nature of MTL-AUG allows for the exploitation of the abundant unlabelled data to further boost the performance of SER. We comprehensively evaluate the proposed framework in the following settings: (1) within corpus, (2) cross-corpus and cross-language, (3) noisy speech, and (4) adversarial attacks. Our evaluations using the widely used IEMOCAP, MSP-IMPROV, and EMODB datasets show improved results compared to existing state-of-the-art methods. Comment: Under review, IEEE Transactions on Affective Computing

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive

HER-2 Immunohistochemical Expression in Bone Sarcomas: A New Hope for Osteosarcoma Patients

Author: Fathy Yasmine
Khalifa Sara E.
Publication venue: 'ID Design 2012/DOOEL Skopje'
Publication date: 04/09/2018

BACKGROUND: Osteosarcoma and chondrosarcoma remain the most common primary bone tumours. Questions have been raised about the prognostic influence of HER-2 in bone sarcomas, but so far the results have been debatable. HER-2 expression is possibly a predictor of chemotherapy response.
AIM: In this study, we investigated the extent of HER-2 expression in bone sarcomas and attempted to correlate it with pertinent variables that will help to provide better treatment options, especially for metastatic cases.
MATERIAL AND METHODS: Fifty-two cases of bone sarcoma (32 osteosarcoma and 20 chondrosarcoma) were studied for HER-2 immunohistochemical expression, and the results were then correlated with all available clinicopathologic features.
RESULTS: Most of the osteosarcoma cases exhibited membranous staining (78.1%). Strong staining (score 3+) was observed in 34.4%, moderate staining (score 2+) in 21.9%, and weak staining (score 1+) in 21.9%; no staining (score 0) was detected in 7 out of 32 cases (21.9%). In chondrosarcoma, staining was absent in all examined cases. Immunohistochemical HER-2 overexpression correlated significantly with osteosarcoma site (P = 0.004), with variation relating HER-2 intensity score to the site of osteosarcoma (P = 0.051). A statistically significant negative correlation was detected between HER-2 expression and the presence of metastasis at the time of diagnosis (P = 0.006). A significant correlation was also found between HER-2 score and the presence of metastasis (P = 0.046), as more than half of the cases with no metastasis at diagnosis (17/28 cases, 60.7%) showed a positive intensity score. A statistically significant correlation was detected between HER-2 expression and patients' age (P = 0.044). HER-2 expression also correlated significantly with histopathological detection of fibrous tissue (P = 0.033). Higher HER-2 expression scores were associated with significantly better differentiation (P = 0.038), since detection of wide areas of osteoid was associated with higher HER-2 scores.
CONCLUSION: Further research is still needed to delineate the role of HER-2 as a new hope for therapeutic targeting in bone sarcoma patients, mainly in osteosarcoma, in contrast to chondrosarcoma, which did not express HER-2 at all.

Directory of Open Access Journals

Towards Optimal Kinetic Energy Harvesting for the Batteryless IoT

Author: Geissdoerfer Kai
Jurdak Raja
Khalifa Sara
Kusy Brano
Portmann Marius
Sandhu Muhammad Moid
Publication date: 18/02/2020

Traditional Internet of Things (IoT) sensors rely on batteries that need to be replaced or recharged frequently, which impedes their pervasive deployment. A promising alternative is to employ energy harvesters that convert environmental energy into electrical energy. Kinetic Energy Harvesting (KEH) converts ambient motion or vibration energy into electrical energy to power IoT sensor nodes. However, most previous works employ KEH without dynamically tracking the optimal operating point of the transducer for maximum power output. In this paper, we systematically analyse the relation between the operating point of the transducer and the corresponding energy yield. To this end, we explore the voltage-current characteristics of the KEH transducer to find its Maximum Power Point (MPP). We show how this operating point can be approximated in a practical energy harvesting circuit. We design two hardware circuit prototypes to evaluate the performance of the proposed mechanism and analyse the harvested energy using a precise load shaker under a wide set of controlled conditions typically found in human-centric applications. We analyse the dynamic current-voltage characteristics and specify the relation between the MPP sampling rate and harvesting efficiency, which outlines the need for dynamic MPP tracking. The results show that the proposed energy harvesting mechanism outperforms the conventional method in terms of generated power and offers at least one order of magnitude higher power than the latter.

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

University of Queensland eSpace

Survey of deep representation learning for speech emotion recognition

Author: Jurdak Raja
Khalifa Sara
Latif Siddique
Qadir Junaid
Rana Rajib
Schuller Björn W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, where hierarchical representations are automatically learned in a data-driven manner. This paper presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges the gap in the literature, since existing surveys focus either on SER with hand-engineered features or on representation learning in the general setting without focusing on SER.

OPUS Augsburg

Queensland University of Technology ePrints Archive

University of Southern Queensland ePrints